AITopics | cross attention

Collaborating Authors

cross attention

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Diffusion Model with Cross Attention as an Inductive Bias for Disentanglement Tao Y ang

Neural Information Processing SystemsNov-19-2025, 22:14:03 GMT

InfoGAN-CR [23]), along with others [37, 28], have been proposed to advance this field further. This work was done during internship at Microsoft Research Asia.

artificial intelligence, machine learning, representation, (15 more...)

Neural Information Processing Systems

Country:

Asia > China > Guangxi Province > Nanning (0.04)
Asia > China > Shaanxi Province > Xi'an (0.04)

Genre: Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)

Add feedback

Integrating Genomics into Multimodal EHR Foundation Models

Amar, Jonathan, Liu, Edward, Breschi, Alessandra, Zhang, Liangliang, Kheradpour, Pouya, Li, Sylvia, Lehmann, Lisa Soleymani, Giulianelli, Alessandro, Edwards, Matt, Jia, Yugang, Nola, David, Mani, Raghav, Vats, Pankaj, Tetreault, Jesse, Chen, T. J., McLean, Cory Y.

arXiv.org Artificial IntelligenceNov-18-2025

This paper introduces an innovative Electronic Health Record (EHR) foundation model that integrates Polygenic Risk Scores (PRS) as a foundational data modality, moving beyond traditional EHR-only approaches to build more holistic health profiles. Leveraging the extensive and diverse data from the All of Us (AoU) Research Program, this multimodal framework aims to learn complex relationships between clinical data and genetic predispositions. The methodology extends advancements in generative AI to the EHR foundation model space, enhancing predictive capabilities and interpretability. Evaluation on AoU data demonstrates the model's predictive value for the onset of various conditions, particularly Type 2 Diabetes (T2D), and illustrates the interplay between PRS and EHR data. The work also explores transfer learning for custom classification tasks, showcasing the architecture's versatility and efficiency. This approach is pivotal for unlocking new insights into disease prediction, proactive health management, risk stratification, and personalized treatment strategies, laying the groundwork for more personalized, equitable, and actionable real-world evidence generation in healthcare.

large language model, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2510.23639

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Health Care Technology > Medical Record (1.00)
Health & Medicine > Consumer Health (0.89)
Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.55)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Add feedback

Make It Long, Keep It Fast: End-to-End 10k-Sequence Modeling at Billion Scale on Douyin

Guan, Lin, Yang, Jia-Qi, Zhao, Zhishan, Zhang, Beichuan, Sun, Bo, Luo, Xuanyuan, Ni, Jinan, Li, Xiaowen, Qi, Yuhang, Fan, Zhifang, Wang, Hangyu, Chen, Qiwei, Cheng, Yi, Zhang, Feng, Yang, Xiao

arXiv.org Artificial IntelligenceNov-11-2025

Short-video recommenders such as Douyin must exploit extremely long user histories without breaking latency or cost budgets. We present an end-to-end system that scales long-sequence modeling to 10k-length histories in production. First, we introduce Stacked Target-to-History Cross Attention (STCA), which replaces history self-attention with stacked cross-attention from the target to the history, reducing complexity from quadratic to linear in sequence length and enabling efficient end-to-end training. Second, we propose Request Level Batching (RLB), a user-centric batching scheme that aggregates multiple targets for the same user/request to share the user-side encoding, substantially lowering sequence-related storage, communication, and compute without changing the learning objective. Third, we design a length-extrapolative training strategy -- train on shorter windows, infer on much longer ones -- so the model generalizes to 10k histories without additional training cost. Across offline and online experiments, we observe predictable, monotonic gains as we scale history length and model capacity, mirroring the scaling law behavior observed in large language models. Deployed at full traffic on Douyin, our system delivers significant improvements on key engagement metrics while meeting production latency, demonstrating a practical path to scaling end-to-end long-sequence recommendation to the 10k regime.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2511.06077

Country: North America > United States (0.30)

Genre: Research Report (0.64)

Industry:

Information Technology > Services (0.46)
Energy (0.46)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.66)

Add feedback

Diffusion Model with Cross Attention as an Inductive Bias for Disentanglement Tao Y ang

Neural Information Processing SystemsOct-10-2025, 10:21:57 GMT

InfoGAN-CR [23]), along with others [37, 28], have been proposed to advance this field further. This work was done during internship at Microsoft Research Asia.

diffusion model, encdiff, representation, (14 more...)

Neural Information Processing Systems

Country:

Asia > China > Guangxi Province > Nanning (0.04)
Asia > China > Shaanxi Province > Xi'an (0.04)

Genre: Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)

Add feedback

FAME: Adaptive Functional Attention with Expert Routing for Function-on-Function Regression

Gao, Yifei, Chen, Yong, Zhang, Chen

arXiv.org Artificial IntelligenceOct-2-2025

Functional data play a pivotal role across science and engineering, yet their infinite-dimensional nature makes representation learning challenging. Conventional statistical models depend on pre-chosen basis expansions or kernels, limiting the flexibility of data-driven discovery, while many deep-learning pipelines treat functions as fixed-grid vectors, ignoring inherent continuity. In this paper, we introduce Functional Attention with a Mixture-of-Experts (FAME), an end-to-end, fully data-driven framework for function-on-function regression. FAME forms continuous attention by coupling a bidirectional neural controlled differential equation with MoE-driven vector fields to capture intra-functional continuity, and further fuses change to inter-functional dependencies via multi-head cross attention. Extensive experiments on synthetic and real-world functional-regression benchmarks show that FAME achieves state-of-the-art accuracy, strong robustness to arbitrarily sampled discrete observations of functions.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2510.00621

Country: North America > United States (0.68)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry: Information Technology (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.93)

Add feedback

ACA-Net: Future Graph Learning for Logistical Demand-Supply Forecasting

Shi, Jiacheng, Wei, Haibin, Wang, Jiang, Xu, Xiaowei, Du, Longzhi, Jiang, Taixu

arXiv.org Artificial IntelligenceSep-3-2025

Logistical demand-supply forecasting that evaluates the alignment between projected supply and anticipated demand, is essential for the efficiency and quality of on-demand food delivery platforms and serves as a key indicator for scheduling decisions. Future order distribution information, which reflects the distribution of orders in on-demand food delivery, is crucial for the performance of logistical demand-supply forecasting. Current studies utilize spatial-temporal analysis methods to model future order distribution information from serious time slices. However, learning future order distribution in online delivery platform is a time-series-insensitive problem with strong randomness. These approaches often struggle to effectively capture this information while remaining efficient. This paper proposes an innovative spatiotemporal learning model that utilizes only two graphs (ongoing and global) to learn future order distribution information, achieving superior performance compared to traditional spatial-temporal long-series methods. The main contributions are as follows: (1) The introduction of ongoing and global graphs in logistical demand-supply pressure forecasting compared to traditional long time series significantly enhances forecasting performance.

artificial intelligence, graph, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2509.01997

Country: Asia > China (0.29)

Genre: Research Report (0.40)

Industry:

Transportation (1.00)
Consumer Products & Services > Food, Beverage, Tobacco & Cannabis (0.96)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)

Add feedback

HyperSteer: Activation Steering at Scale with Hypernetworks

Sun, Jiuding, Baskaran, Sidharth, Wu, Zhengxuan, Sklar, Michael, Potts, Christopher, Geiger, Atticus

arXiv.org Artificial IntelligenceJun-5-2025

Steering language models (LMs) by modifying internal activations is a popular approach for controlling text generation. Unsupervised dictionary learning methods, e.g., sparse autoencoders, can be scaled to produce many steering vectors, but lack guarantees on the individual efficacy of each vector and control over the coverage of relevant steering tasks. In contrast, supervised methods for constructing steering vectors are targeted and effective, but require more data collection and training for each additional steering vector produced. In this work, we introduce HyperSteer, a family of hypernetwork-based architectures which are trained end-to-end to generate steering vectors conditioned on the natural language steering prompts and the internals of the steered LM. In our evaluations, we show that scaling HyperSteer with thousands of steering prompts exceeds the performance of state-of-the-art activation steering methods, even on steering prompts never seen during training. Moreover, HyperSteer performs on par with steering-via-prompting.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2506.03292

Country: North America > United States (0.68)

Genre: Research Report (0.40)

Industry: Education > Health & Safety > School Nutrition (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Diffusion Model with Cross Attention as an Inductive Bias for Disentanglement

Neural Information Processing SystemsMay-27-2025, 09:38:02 GMT

Disentangled representation learning strives to extract the intrinsic factors within the observed data. Factoring these representations in an unsupervised manner is notably challenging and usually requires tailored loss functions or specific structural designs. In this paper, we introduce a new perspective and framework, demonstrating that diffusion models with cross-attention itself can serve as a powerful inductive bias to facilitate the learning of disentangled representations. We propose to encode an image into a set of concept tokens and treat them as the condition of the latent diffusion model for image reconstruction, where cross attention over the concept tokens is used to bridge the encoder and the U-Net of the diffusion model. We analyze that the diffusion process inherently possesses the time-varying information bottlenecks.

diffusion model, disentanglement, representation, (5 more...)

Neural Information Processing Systems

Country: Asia > China > Guangxi Province > Nanning (0.08)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Object Isolated Attention for Consistent Story Visualization

Luo, Xiangyang, Cheng, Junhao, Xie, Yifan, Zhang, Xin, Feng, Tao, Liu, Zhou, Ma, Fei, Yu, Fei

arXiv.org Artificial IntelligenceMar-30-2025

Open-ended story visualization is a challenging task that involves generating coherent image sequences from a given storyline. One of the main difficulties is maintaining character consistency while creating natural and contextually fitting scenes--an area where many existing methods struggle. In this paper, we propose an enhanced Transformer module that uses separate self attention and cross attention mechanisms, leveraging prior knowledge from pre-trained diffusion models to ensure logical scene creation. The isolated self attention mechanism improves character consistency by refining attention maps to reduce focus on irrelevant areas and highlight key features of the same character. Meanwhile, the isolated cross attention mechanism independently processes each character's features, avoiding feature fusion and further strengthening consistency. Notably, our method is training-free, allowing the continuous generation of new characters and storylines without re-tuning. Both qualitative and quantitative evaluations show that our approach outperforms current methods, demonstrating its effectiveness.

diffusion model, large language model, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2503.23353

Country: Asia > China > Guangdong Province > Shenzhen (0.05)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.95)
Information Technology > Sensing and Signal Processing > Image Processing (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

PanopticSplatting: End-to-End Panoptic Gaussian Splatting

Xie, Yuxuan, Yu, Xuan, Jiang, Changjian, Mao, Sitong, Zhou, Shunbo, Fan, Rui, Xiong, Rong, Wang, Yue

arXiv.org Artificial IntelligenceMar-23-2025

Open-vocabulary panoptic reconstruction is a challenging task for simultaneous scene reconstruction and understanding. Recently, methods have been proposed for 3D scene understanding based on Gaussian splatting. However, these methods are multi-staged, suffering from the accumulated errors and the dependence of hand-designed components. To streamline the pipeline and achieve global optimization, we propose PanopticSplatting, an end-to-end system for open-vocabulary panoptic reconstruction. Our method introduces query-guided Gaussian segmentation with local cross attention, lifting 2D instance masks without cross-frame association in an end-to-end way. The local cross attention within view frustum effectively reduces the training memory, making our model more accessible to large scenes with more Gaussians and objects. In addition, to address the challenge of noisy labels in 2D pseudo masks, we propose label blending to promote consistent 3D segmentation with less noisy floaters, as well as label warping on 2D predictions which enhances multi-view coherence and segmentation accuracy. Our method demonstrates strong performances in 3D scene panoptic reconstruction on the ScanNet-V2 and ScanNet++ datasets, compared with both NeRF-based and Gaussian-based panoptic reconstruction methods. Moreover, PanopticSplatting can be easily generalized to numerous variants of Gaussian splatting, and we demonstrate its robustness on different Gaussian base models.

artificial intelligence, gaussian, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2503.18073

Country:

Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)
Asia > China > Zhejiang Province > Hangzhou (0.04)
Asia > China > Shanghai > Shanghai (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback